Motifs in Triadic Random Graphs based on Steiner Triple Systems
Conventionally, pairwise relationships between nodes are considered to be the fundamental
building blocks of complex networks. However, over the last decade the overabundance of certain
sub-network patterns, so-called motifs, has attracted considerable attention. It has been hypothesized
that these motifs, instead of links, serve as the building blocks of network structures.

Although the relation between a network's topology and the general properties of the system, 
such as its function, its robustness against perturbations, or its efficiency in spreading information,
is the central theme of network science, there is still a lack of sound generative models needed for 
testing the functional role of subgraph motifs. Our work aims to overcome this limitation. 

We employ the framework of exponential random graphs (ERGMs) to define novel models based on 
triadic substructures. The fact that only a small portion of triads can actually be set independently 
poses a challenge for the formulation of such models. To overcome this obstacle we use Steiner 
Triple Systems (STS). These are partitions of sets of nodes into pair-disjoint triads, which thus can 
be specified independently. Combining the concepts of ERGMs and STS, we suggest novel generative 
models capable of generating ensembles of networks with non-trivial triadic Z-score profiles. Further, 
we discover inevitable correlations between the abundance of triad patterns, which occur solely for 
statistical reasons and need to be taken into account when discussing the functional implications 
of motif statistics. Moreover, we calculate the degree distributions of our triadic random graphs 
analytically. 



I. INTRODUCTION 

The topological structure of interactions among the 
constituents of complex many particle systems is inti- 
mately linked to system function and global system prop- 
erties. The study of complex networks aims to elucidate 
this link between structure and function. 

Motivated by the stark contrast between topological 
features found in real-world data and expectations based 
on the assumption of purely random link formation [HIS], 
two main threads of research can be identified. 

The first thread is aimed predominantly at explaining 
the network formation process, i.e. identifying the forces 
shaping a network. A particularly productive approach 
has been the development of network growth models
following the publication of Barabási and Albert [3] to
explain non-Poissonian degree distributions. See [^E] and
the references therein for a review. Growth models gen- 
erally take the agreement between a particular feature in 
real-world data with networks resulting from a particu- 
lar model as evidence for a particular aspect of a growth 
process, such as preferential attachment. 

The second thread of research focusses on explaining
the influence certain topological features may have on
global system properties such as the robustness against
perturbations or the stability of the system under node
or link removal [7], as well as on dynamical processes
taking place on the network. In order to study such
questions systematically, the ability to generate an
ensemble of networks with a precise set of topological
features, but no others, is crucial. Growth models are
generally not suited for this task since a network formation
process often introduces invariable correlations between 
network features that are difficult to disentangle. For 
example, the Barabási-Albert model is capable of
generating networks with a broad degree distribution, but
at the same time introduces degree-degree correlations. 
Further, growth models are generally very difficult to 
characterize in terms of their statistical properties. In 
contrast, generative probabilistic models which parame- 
terize an ensemble of networks via an explicit expression 
for the probability distribution over adjacency matrices 
can facilitate such analysis. The present work introduces 
a new class of probabilistic generative models. 

Good generative probabilistic models of networks 
should combine three characteristics: First, every as- 
pect of network structure that is not explicitly specified 
through the parametrization is maximally random. Sec- 
ond, they should allow for unbiased estimation of param- 
eters from data. If parameters are estimated from data, 
these data are typical for the ensemble thus parameter- 
ized. Third, they should be easy to specify and parame- 
ters should be simple to learn and interpret. Exponential 
random graph models (ERGMs), i.e. those that specify a 
Boltzmann distribution over the set of all adjacency
matrices of given size, meet all of these criteria [HHH]. They
are maximum entropy, mean unbiased, and parameters
can be learnt consistently via maximum likelihood
estimators or Markov chain Monte Carlo (MCMC) methods.

Generally, pairwise relations between nodes, so-called 
dyads, are considered the fundamental building blocks of 
complex networks and hence also the fundamental unit 
when modeling a network, regardless of whether by growth






FIG. 1. All 16 possible non-isomorphic triadic subgraphs 
(subgraph patterns) in directed networks. 



or probabilistic models. Erdős-Rényi (ER) graphs, the
configuration model [111 [TT], stochastic block models \TW-W]
and degree-corrected block models [22] all fall into the
class of dyadic models. The basic assumption underlying 
dyadic models is that dyads are conditionally independent 
given the model's parameters. 

However, the assumption of dyadic independence as a 
general paradigm of network modeling seems question- 
able. For example, in a social context, the idea that 
the relation of Alice and Bob be independent from the 
relation of Alice and Charlie seems to go against experi- 
ence, especially if the relation is of romantic type. Sim- 
ilarly, triadic closure, or the large clustering coefficient 
observed in many networks, hints at a dependence
between the connections in a network. Generalizing these
ideas, during the last decade the systematic study of third-
and fourth-order sub-network structure has attracted
considerable attention p5H?f]. Apart from node permutations, there
are 16 distinct triad patterns in directed unweighted
networks, as shown in Fig. 1. It was found that certain
patterns of three-node subgraphs occur significantly more
frequently than expected in an ensemble of networks
generated by shuffling the connections of the original
network under the constraint of preserving the nodes' in-
and out-degrees.

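Sampling from an ensemble of randomized networks
yields an average occurrence $\langle N_{\mathrm{rand},i} \rangle$ and a standard
deviation $\sigma_{\mathrm{rand},i}$ for each triad pattern i shown in Fig. 1.
Over- and underrepresentation of pattern i is quantified
through a Z-score

$Z_i = \frac{N_{\mathrm{original},i} - \langle N_{\mathrm{rand},i} \rangle}{\sigma_{\mathrm{rand},i}}$ (1)
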
Notice that Z-scores are evaluated by counting the
subgraph patterns over all $\binom{N}{3}$ possible triads. Every
network can be assigned a vector $\mathbf{Z}$ whose components
comprise the Z-scores of all possible triad patterns.
Significant patterns are referred to as 'motifs' [53]. It is
common to consider only the Z-scores of the triad patterns
in which all three nodes are attached to an edge.
Further, one commonly refers to the normalized Z-vector as
the 'significance profile', $\mathbf{SP} = \mathbf{Z} / \sqrt{\sum_i Z_i^2}$. This
normalization makes systems of different sizes comparable
[23]. Many real-world systems have been examined with
respect to their triadic Z-scores and significance profiles
[^ IMl E5H5n] and it was suggested that they can be
grouped into so-called 'super-families' fM\ .
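
As an illustration of the two definitions above, the following minimal Python sketch computes Z-scores from pattern counts and normalizes them to a significance profile. The function names, the toy counts, and the use of the sample standard deviation (`ddof=1`) are assumptions made for this example; they are not taken from the mfinder software or from the analysis reported here.

```python
import numpy as np


def z_scores(n_original, n_random):
    """Z-score per pattern.

    n_original: counts per pattern in the original network, shape (K,).
    n_random:   counts per pattern in R randomized networks, shape (R, K).
    """
    n_original = np.asarray(n_original, dtype=float)
    n_random = np.asarray(n_random, dtype=float)
    mean = n_random.mean(axis=0)
    # Assumption: sample standard deviation over the randomized ensemble.
    std = n_random.std(axis=0, ddof=1)
    return (n_original - mean) / std


def significance_profile(z):
    """Normalize a Z-vector to unit Euclidean length."""
    z = np.asarray(z, dtype=float)
    return z / np.sqrt(np.sum(z ** 2))


# Hypothetical toy counts for three patterns and four randomizations.
n_orig = [30, 5, 12]
n_rand = [[20, 9, 12], [22, 11, 10], [18, 10, 14], [20, 10, 12]]

z = z_scores(n_orig, n_rand)
sp = significance_profile(z)
print(z, sp)
```

In this toy example the first pattern is overrepresented, the second underrepresented, and the third matches the randomized ensemble on average.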

Surprisingly, to date, no general model exists that can 
fully explain or model the triad significance profiles ob- 



served in many real world networks. The present work 
suggests a generative probabilistic model capable of de- 
scribing a wide range of significance profiles. 

A number of growth models exist which are capable 
of reproducing certain parts of the motif statistics, in 
particular, the fraction of closed triangles by explicitly 
formulating 'triadic closure' processes. Starting from an 
initially unclustered network, one searches for edges with 
a common neighbor and then connects them successively 
to form triangles f5TH55]. Yet, the calculation of their
properties is limited to numerical approaches [31].

Further, specifying generative models has proven diffi- 
cult. Using the Strauss model TU], specified by a Hamil- 
tonian with two fields, one acting on individual links, the 
other one acting on triads of links, it is possible to gener- 
ate systems with - on average - predefined link and triad 
appearance. However, Park and Newman could show 
that the average does not describe the properties of a 
typical system generated by the model. In fact, there 
is a large degenerate phase in which most instances of
networks tend to be either fully connected or empty [37].

Another alternative suggested by Newman generates 
networks in which both the number of single links, $s_i$, of
every node i, as well as the number of triads, $t_i$, it
participates in are specified initially [31]. The model yields
networks, drawn uniformly at random from the set of 
all possible matchings of 'stubs' and 'corners'. With this 
generalization of random-graph models, it is possible to 
compute analytically component sizes, the existence and 
size of a giant component, and percolation properties. 
The model yields an unbiased ensemble of networks with 
clustering. However, attempting to specify the probabil- 
ities for all possible three-node subgraphs simultaneously 
poses a problem. 

Alternatively, it has been noted early on that latent 
variables might offer an explanation for the observed
motif distributions within the framework of dyadic
independence models. The randomization employed in the
calculation of the Z-scores ignores all mesoscopic structure
possibly present in the system. Thus, parts of the over- 
and underrepresentations of certain motifs, compared to 
the randomized versions, may stem from such structure 
[55H40J. For example, some features of the significance profile
of the neural network of C. elegans could successfully be
explained by means of latent class structure, while
accounting for both properties on the individual node level
and on the group level [38]. The abundance of triad
motifs is apparently strongly related to mesoscopic network
structure or, in other words, comparison of a network
with block structure to a null model which does not
account for such groups may result in Z-scores which are
more or less artifacts of the mesoscopic structure. Yet,
mesoscopic block models alone are not sufficient to
explain all observed motifs.

In general, when trying to reproduce triad structures, 
models formulated in terms of dyads face the difficulty 
that each dyad influences an extensive number of triads.
On the other hand, directly modeling all triad structures 



is impossible, as not all local triad configurations may be 
specified independently from each other. Yet, the Z-score 
statistics are obtained by considering every individual tri- 
adic subgraph pattern. 

In the following section we will suggest a model which 
is based on triads rather than dyads which actually can 
be specified independently from each other, so-called 
Steiner Triple Systems (STSs). Starting from the frame- 
work of Steiner Triple Systems, it will be possible to de- 
fine a whole class of triadic exponential random graph 
models. In this paper we discuss the most basic of such 
models, one which assumes the same probability distri- 
bution of triadic subgraph configurations on all Steiner 
Triples (STs). This can be considered the triadic analogue
of ER graphs among dyadic models. We will investigate
how a distribution on the STs affects the corresponding
triad significance profiles. With this work, we will
be able to investigate correlations in the abundance of
triad patterns which occur solely for statistical reasons.
Moreover, we provide a class of generative models
which are capable of modeling structure of higher than
dyadic order. We aim to design ensembles of networks
with pre-defined Z-score profiles. In Section II we will
introduce the concept of STSs; subsequently, in Section
III we will define the triadic random graph model, a
generative model based on STs. Finally, in Section IV we
will present results for the latter. In particular, we will
show that triadic random graphs are capable of generat- 
ing networks with non-vanishing Z-scores. Furthermore, 
we will investigate correlations in the appearance of tri- 
adic subgraph patterns and discuss their implications for 
the functional interpretation of motif significance pro- 
files. Finally, we will calculate the degree distribution of 
triadic random graphs analytically. 



II. STEINER TRIPLE SYSTEMS 

We will now define the terminology used throughout 
the remainder of this article: A dyad is a set of two 
nodes. An edge, or interchangeably a link, describes the 
presence of a dyadic connection, i.e. a connection be- 
tween two nodes; it can be uni- or bidirectional. A triad 
is a set of three nodes. A triangle denotes three mutually 
interconnected nodes. A subgraph is a part of a network 
which considers only a subset of all nodes, including their 
mutual connections. A subgraph configuration is a
specification of the connections in a subgraph, while
accounting for node identities; e.g. dyad configuration $A \to B$ is
distinct from dyad configuration $A \leftarrow B$. Subgraph
patterns are sets of nodes including their relations without
accounting for node identities, i.e. isomorphic subgraph
configurations are mapped to the same subgraph pattern;
e.g. dyad pattern $A \to B$ is the same as pattern $A \leftarrow B$.
A(n) (anti-)motif is a subgraph pattern which is signif- 
icantly over- (under-) represented, as compared to some 
null model. 

In a network of N nodes there are $T = \binom{N}{3}$ distinct




FIG. 2. (Color online) Only few triad configurations can be
specified independently of each other: e.g. a specification of
the triads (1,2,3), (1,4,5), and (2,4,6) fully determines the
configuration of (1,2,4).



triads. Yet, it is not possible to specify all their triadic
subgraph configurations independently of each other; e.g.
consider the network in Fig. 2. Suppose we set the
relations in the three-node subgraph of nodes 1, 2, and 3,
denoted as (1, 2, 3), such that they adopt pattern A.
Further, we specify the triads (1, 4, 5) and (4, 6, 2) such that
they assume patterns A and A, respectively. Then, with
the choices for the discussed three triads in Fig. 2, the
subgraph of (4, 1, 2) is already determined to take the
pattern A implicitly. This is because (4, 1, 2) contains
dyadic relations which have already been assigned in the
other three triads.

Since there are only $E = \binom{N}{2}$ dyads in a network
and every triad comprises three dyadic relations, there
is an upper bound to the number of triads which are
dyad-disjoint and therefore can be set without
over-determining the system:

$\#\ \text{of dyad-disjoint triads} \le \frac{E}{3} = \frac{N(N-1)}{6} \le T$ (2)

Networks for which the upper bound is exactly met can
be partitioned into triads such that every pair of nodes
in the system is part of exactly one of them. Such
systems are called Steiner Triple Systems (STS) ^IT]. STSs
consisting of N vertices are called Steiner Triple Systems
of order N, or STS(N). There are two necessary and
sufficient requirements for the existence of an STS(N):

$N \bmod 2 = 1$ (3)
$N(N-1) \bmod 3 = 0$ (4)



For a detailed discussion see e.g. |52l page 277ff] or [33J 
page 205ff]. The problem was originally solved by Kirk- 
man in 1847 [?T. 
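
The counting bound and the two existence conditions can be checked numerically. The short Python sketch below is only an illustration; the function names `triad_counts` and `sts_exists` are hypothetical and introduced for this example.

```python
from math import comb


def triad_counts(n):
    """Return (number of dyads E, number of triads T, bound E // 3)."""
    dyads = comb(n, 2)
    triads = comb(n, 3)
    return dyads, triads, dyads // 3


def sts_exists(n):
    """Existence conditions: N odd and N(N-1) divisible by 3."""
    return n % 2 == 1 and (n * (n - 1)) % 3 == 0


# Orders below 70 that admit a Steiner Triple System.
admissible = [n for n in range(3, 70) if sts_exists(n)]
print(triad_counts(7))
print(admissible)
```

The admissible orders are exactly those with N = 1 or 3 modulo 6, which includes the system sizes 49 and 63.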

From Eq. (3) and Eq. (4) we can conclude that, by
approximation, systems of arbitrary size can be
decomposed into Steiner Triples. All one has to do is either add
up to three 'dummy' nodes to the system, or to ignore 
up to three nodes including their relations. 
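
The claim that at most three dummy nodes are ever needed can be verified by brute force. The following Python sketch is a hedged illustration; `dummy_nodes_needed` is a hypothetical helper name, and the range of sizes tested is an arbitrary choice.

```python
def sts_exists(n):
    """Existence conditions for an STS of order n."""
    return n % 2 == 1 and (n * (n - 1)) % 3 == 0


def dummy_nodes_needed(n):
    """Smallest number of nodes to add so that an STS exists."""
    extra = 0
    while not sts_exists(n + extra):
        extra += 1
    return extra


worst_case = max(dummy_nodes_needed(n) for n in range(3, 200))
print(worst_case)
```

The worst case occurs for sizes with N = 4 modulo 6, which have to be padded to the next size with N = 1 modulo 6.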

To fix ideas, Fig. 3 shows the partition of an STS of
order 7 into STs. Due to the small number of vertices it
is possible to derive the STS deductively: Without loss
of generality, we start with node 1. Since 1 is part of six

FIG. 3. (Color online) Schematic presentation of a Steiner
Triple System of order seven. The Steiner Triples are set to be
(1,2,3), (1,4,5), (1,6,7), (2,4,6), (2,5,7), (3,4,7), and (3,5,6), as
indicated by the colors of the matrix elements. Every matrix
element is assigned to exactly one Steiner Triple.



(1,2,3)  (1,3,6)  (1,6,7)  (2,4,7)  (3,5,6)
(1,2,4)  (1,3,7)  (2,3,4)  (2,5,6)  (3,5,7)
(1,2,5)  (1,4,5)  (2,3,5)  (2,5,7)  (3,6,7)
(1,2,6)  (1,4,6)  (2,3,6)  (2,6,7)  (4,5,6)
(1,2,7)  (1,4,7)  (2,3,7)  (3,4,5)  (4,5,7)
(1,3,4)  (1,5,6)  (2,4,5)  (3,4,6)  (4,6,7)
(1,3,5)  (1,5,7)  (2,4,6)  (3,4,7)  (5,6,7)

TABLE I. Possible specification of Steiner Triples for a system
of size 7.



dyads, one with every remaining node, it has to be part
of three Steiner Triples. The first one shall be (1,2,3)
(color coded in yellow in Fig. 3), the second one (1,4,5)
(red), and the third one (1,6,7) (cyan). Now each dyadic
relation node 1 participates in is covered by exactly one Steiner
Triple. We continue with the dyads of node 2: those with
nodes 1 and 3 are already contained in (1,2,3). Nodes 4 and 5
are already part of ST (1,4,5) and therefore need to be
assigned to distinct STs. We choose 6 to be in the ST with
2 and 4 (blue), and thus we have also specified ST (2,5,7)
(green). Continuing with node 3, the dyads with nodes
4, 5, 6, and 7 need to be assigned to STs. Node 4 is already
assigned to STs with 5 and 6. Thus, the two remaining
STs are (3,4,7) (magenta) and (3,5,6) (orange).

Of the $\binom{7}{3} = 35$ possible triads of a network of order
seven, only $E/3 = 7 \cdot 6/6 = 7$ triads can be specified
independently of each other.
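
That the seven triples listed in the caption of Fig. 3 indeed cover every dyad exactly once can be confirmed with a few lines of Python. This is a minimal sketch; the helper name `is_steiner_triple_system` is an assumption of this example.

```python
from collections import Counter
from itertools import combinations

# The Steiner Triples of order seven as listed in the caption of Fig. 3.
STS7 = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 4, 6),
        (2, 5, 7), (3, 4, 7), (3, 5, 6)]


def is_steiner_triple_system(triples, nodes):
    """True if every pair of nodes lies in exactly one triple."""
    cover = Counter()
    for triple in triples:
        for pair in combinations(sorted(triple), 2):
            cover[pair] += 1
    all_pairs = set(combinations(sorted(nodes), 2))
    return set(cover) == all_pairs and all(c == 1 for c in cover.values())


print(is_steiner_triple_system(STS7, range(1, 8)))
```

Replacing (2,5,7) by (1,5,7) makes the check fail, because the dyad (1,5) would then be covered twice while (2,5) would not be covered at all.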

Table I lists all triads for a network of seven nodes.
A possible choice of an STS is highlighted with
colors corresponding to Fig. 3.

A detailed proof that Eq. (3) and Eq. (4) are indeed
sufficient for the existence of an STS can be found in [15].

Of course, for larger system sizes it is not practical 
to construct STSs the way described above. However, 
larger STS can be constructed by merging smaller ones. 
For the STS(7), the partition described above is unique, 
apart from relabeling nodes. For STSs of higher order, 
there are multiple non-isomorphic ways to partition the 
nodes into Steiner Triples. STSs provide us with sets of 
triads which can actually be configured without
overdetermining dyadic relations. They can thus be considered
a basis to express an adjacency matrix.
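
The text does not specify how the larger systems were built. One standard textbook route for orders N = 6k + 3 (which includes N = 63 but not N = 49) is the Bose construction; the Python sketch below uses it purely as an illustration and is not claimed to be the procedure used in this work. The function names and the node labelling 0, ..., N-1 are assumptions of this example.

```python
from itertools import combinations


def bose_sts(n):
    """Steiner Triple System of order n = 6k + 3 (Bose construction)."""
    if n % 6 != 3:
        raise ValueError("the Bose construction needs n = 6k + 3")
    m = n // 3               # m = 2k + 1 is odd
    half = (m + 1) // 2      # multiplicative inverse of 2 modulo m

    def node(x, level):
        """Label of element x on one of the three levels."""
        return x + m * level

    # One triple per element, connecting its three copies.
    triples = [(node(x, 0), node(x, 1), node(x, 2)) for x in range(m)]
    for x, y in combinations(range(m), 2):
        z = ((x + y) * half) % m   # idempotent commutative quasigroup
        for level in range(3):
            triples.append(
                (node(x, level), node(y, level), node(z, (level + 1) % 3)))
    return triples


def covers_each_dyad_once(triples, n):
    """True if every pair among n nodes lies in exactly one triple."""
    seen = set()
    for triple in triples:
        for pair in combinations(sorted(triple), 2):
            if pair in seen:
                return False
            seen.add(pair)
    return len(seen) == n * (n - 1) // 2


sts63 = bose_sts(63)
print(len(sts63), covers_each_dyad_once(sts63, 63))
```

For orders N = 6k + 1 a different construction is required, which is not sketched here.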

In order to account for substructures of higher than 
dyadic order, our goal is now to define a model based on 



triadic rather than dyadic entities. Since Steiner Triple 
Systems assign every dyadic relation, i.e. every pair of 
nodes, to exactly one triad, the specification of the con- 
figurations of all Steiner Triples is equivalent to specify- 
ing an adjacency matrix A. To convince oneself that a 
formulation of a network in terms of Steiner Triples is 
equivalent to a formulation in terms of dyads, consider 
a directed unweighted graph with N vertices. There are 
$\binom{N}{2}$ dyads. Each dyad (i,j) may adopt four distinct
configurations. Thus, in total there are $4^{\binom{N}{2}} = 2^{2\binom{N}{2}}$ possible
states of the system, i.e. distinct adjacency matrices. On
the other hand, there are $\binom{N}{2}/3$ distinct Steiner Triples.
Each of those triads may assume $2^6 = 64$ distinct
configurations (each of the six unidirectional links in the triad
may be present or absent). Therefore, again we obtain
$64^{\binom{N}{2}/3} = 2^{6\binom{N}{2}/3} = 2^{2\binom{N}{2}}$ possible states. The argument
for undirected graphs is analogous.
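
The equality of the two state counts can be checked with exact integer arithmetic. The sketch below is illustrative only; the function names are hypothetical.

```python
from math import comb


def states_dyadic(n):
    """Four configurations per dyad in a directed unweighted graph."""
    return 4 ** comb(n, 2)


def states_triadic(n):
    """64 configurations per Steiner Triple; needs an admissible order n."""
    return 64 ** (comb(n, 2) // 3)


for n in (7, 9, 13):
    print(n, states_dyadic(n) == states_triadic(n))
```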

III. MODEL 

Let us recall that dyadic ERGMs assume that the like- 
lihoods for the presence of two edges are conditionally 
independent of each other. Further, let the matrix D
with components $D_{ij} \in \{0,1\}$ denote the random
variables corresponding to the entries of the adjacency matrix
A. Then, the independence assumption implies for the
likelihood of observing an adjacency matrix A

$P(D = A \mid \theta) = \prod_{i=1}^{N-1} \prod_{j=i+1}^{N} P(D_{ij} = A_{ij}, D_{ji} = A_{ji} \mid \theta)$
$\qquad = \prod_{i=1}^{N-1} \prod_{j=i+1}^{N} P(\vec{D}_{(i,j)} = \vec{A}_{(i,j)} \mid \theta)$ (5)

where $\theta$ includes all parameters of the model. The vector
notation on the right hand side accounts for the fact that
in directed unweighted networks, there are four possible
configurations for each dyad, which can
be combined in a four dimensional indicator vector $\vec{A}_{(i,j)}$
with all components being zero, except for one being one.
We will now employ the concept of Steiner Triple
Systems to define the triadic analogue of Eq. (5). Now,
instead of assuming the likelihoods of dyads to be
conditionally independent of each other, we suppose the
likelihoods for the configurations on Steiner Triples to be
conditionally independent. With this assumption, the
likelihood of observing an adjacency matrix A factorizes as
follows:

$P(D = A \mid \theta) = \prod_{\sigma} P(\vec{D}_{\sigma} = \vec{A}_{\sigma} \mid \theta)$ (6)

FIG. 4. The three isomorphic configurations belonging to
pattern 4.



where $\sigma$ denotes the Steiner Triples of an STS(N), $\vec{D}_{\sigma}$
is an indicator variable for the configuration of Steiner
Triple $\sigma$, and $\vec{A}_{\sigma}$ is a value of this variable. Analogously
to Eq. (5), for each of the vectors exactly one component
is unity, while all others are zero, which is equivalent to
the fact that a triad cannot be in multiple configurations
at the same time. For undirected networks it is
$\vec{D}_{\sigma} \in \{0,1\}^{8}$, for directed ones it is $\vec{D}_{\sigma} \in \{0,1\}^{64}$. Accordingly,
it is $P(\vec{D}_{\sigma} \mid \theta) \in [0,1]^{8}$ or $[0,1]^{64}$, respectively, with the
sums of the elements normalized to one. By defining
Eq. (6) we make the assumption that the likelihoods
of Steiner Triple configurations factorize, i.e. they are
conditionally independent of each other.

For unweighted graphs, Eq. (6) describes the most
general formulation of models based on conditionally
independent STs. We will now further investigate the
properties of a particular realization of this class of models.
The simplest such model has the same likelihood
distribution for the triad configurations on all Steiner Triples,
$P(\vec{D}_{\sigma} \mid \theta) = P(\vec{D} \mid \theta)$. This can be regarded as the triadic
analogue of dyadic ER graphs, in which the likelihood
for the existence of an edge is the same for all dyads.
We will refer to them as triadic random graphs. Since
the ordering of the nodes in a Steiner Triple is arbitrary,
there is no need to distinguish between isomorphic triad
configurations. E.g. the likelihoods of the three
configurations of subgraph 4, shown in Fig. 4, will be the same.
Thus, the triadic random graphs have 16 parameters, each
of them indicating the probability of a Steiner Triple to
assume one of the subgraphs shown in Fig. 1. Of course,
their values need to sum up to unity.

Given the parameters, the probability distribution of
each Steiner Triple is given by:

$P(\vec{D} \mid \theta) = M \vec{V} = M \left( p(1), p(2), \dots, p(16) \right)^{T}$ (7)

Here, $p(k)$ denotes the probability of subgraph pattern k of Fig. 1.




The matrix M maps each of the 16 non-isomorphic
subgraph patterns in Fig. 1 to their corresponding
isomorphic configurations with equal probability, i.e. the sums
of its columns are normalized to one. Here, the only
parameters $\theta$ of the model are the entries of the vector $\vec{V}$.

Eq. (6) and Eq. (7) describe the triadic random graph
model, in which the configuration for each Steiner Triple
is drawn - conditionally independent from other Steiner
Triples - from the same probability distribution over the
16 subgraphs shown in Fig. 1.
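
The generative procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the results: the names `ARCS`, `canonical`, `mapping_matrix`, and `sample_triadic_random_graph` are hypothetical, nodes are labelled from 0, and the 16 patterns are ordered by link count and canonical form, which need not coincide with the numbering of Fig. 1.

```python
from itertools import permutations, product

import numpy as np

# The six possible unidirectional links among the local nodes 0, 1, 2.
ARCS = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1)]


def canonical(arcs):
    """Smallest relabelled arc list over all six node permutations."""
    return min(tuple(sorted((p[a], p[b]) for a, b in arcs))
               for p in permutations(range(3)))


# All 64 configurations, each given by the arcs that are present.
CONFIGS = [tuple(arc for arc, bit in zip(ARCS, bits) if bit)
           for bits in product((0, 1), repeat=6)]
# The isomorphism classes of configurations are the subgraph patterns.
PATTERNS = sorted({canonical(c) for c in CONFIGS},
                  key=lambda c: (len(c), c))


def mapping_matrix():
    """64 x 16 matrix spreading each pattern evenly over its configurations."""
    m = np.zeros((len(CONFIGS), len(PATTERNS)))
    for row, config in enumerate(CONFIGS):
        m[row, PATTERNS.index(canonical(config))] = 1.0
    return m / m.sum(axis=0)   # columns are normalized to one


def sample_triadic_random_graph(triples, n_nodes, v, rng):
    """Draw one adjacency matrix; v holds the 16 pattern probabilities."""
    config_probs = mapping_matrix() @ np.asarray(v, dtype=float)
    adjacency = np.zeros((n_nodes, n_nodes), dtype=int)
    for triple in triples:
        config = CONFIGS[rng.choice(len(CONFIGS), p=config_probs)]
        for a, b in config:
            adjacency[triple[a], triple[b]] = 1
    return adjacency


# Usage with the STS of order seven (relabelled to start at 0).
sts7 = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5),
        (1, 4, 6), (2, 3, 6), (2, 4, 5)]
rng = np.random.default_rng(0)
uniform_v = np.full(len(PATTERNS), 1.0 / len(PATTERNS))
print(sample_triadic_random_graph(sts7, 7, uniform_v, rng))
```

Because every dyad belongs to exactly one Steiner Triple, each entry of the adjacency matrix is written exactly once, so no configuration drawn for one triple can overwrite another.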

If (unidirectional) links are set purely at random with
probability p, as is the case in ER graphs, the probability
of a triadic subgraph pattern i which comprises $m_i$ of the
six possible links and has $g_i$ isomorphic configurations is

$p_i = g_i \, p^{m_i} (1-p)^{6-m_i}$ (8)

The triadic random graph model allows us to deviate
from this probability distribution. Therefore, we can
enhance or suppress certain substructures as compared to
ER graphs.
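
For reference, the pattern probabilities of a triad whose six possible links are set independently with probability p can be obtained by enumerating all 64 configurations and grouping them into isomorphism classes. The Python sketch below is an illustration only; the names are hypothetical, and patterns are identified by a canonical arc list rather than by the numbering of Fig. 1.

```python
from collections import Counter
from itertools import permutations, product

# The six possible unidirectional links among the local nodes 0, 1, 2.
ARCS = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1)]


def canonical(arcs):
    """Smallest relabelled arc list over all six node permutations."""
    return min(tuple(sorted((p[a], p[b]) for a, b in arcs))
               for p in permutations(range(3)))


def er_pattern_probabilities(p):
    """Probability of each of the 16 patterns for independent links."""
    probs = Counter()
    for bits in product((0, 1), repeat=6):
        arcs = [arc for arc, bit in zip(ARCS, bits) if bit]
        m = len(arcs)
        # Each configuration with m links has weight p^m (1-p)^(6-m).
        probs[canonical(arcs)] += p ** m * (1 - p) ** (6 - m)
    return probs


probs = er_pattern_probabilities(0.5)
print(len(probs), sum(probs.values()))
```

At p = 0.5 every configuration is equally likely, so each pattern probability equals the number of its isomorphic configurations divided by 64.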



IV. RESULTS 



A. Z-score profiles 



In order to examine the impact of the triad distribution
for the STS on the Z-score profile of the total network,
we performed extensive sampling on the 16-dimensional
simplex defined by the probability distributions (7).
Sampling was performed for systems of size 49 and 63.
For the computation of the Z-score profiles we used the
mfinder software (version 1.2) [45] and averaged the
Z-score for each vector $\vec{V}$ over multiple samples. It should be
noted that the software only considers those triad
configurations which have all three nodes attached to at least
one edge. Thus, there are no Z-scores for the subgraphs
1, 2, and 3 of Fig. 1. Yet, of course it is
necessary to account for them in the input distributions
for the STS.
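
The text does not state how the input distributions were drawn. One common way to sample probability vectors uniformly from a simplex is a flat Dirichlet distribution; the Python sketch below assumes this choice, and the function name and seed are hypothetical.

```python
import numpy as np


def sample_pattern_distributions(n_samples, n_patterns=16, seed=0):
    """Draw probability vectors uniformly from the simplex.

    Assumption: a Dirichlet distribution with all concentration
    parameters equal to one, which is uniform on the simplex.
    """
    rng = np.random.default_rng(seed)
    return rng.dirichlet(np.ones(n_patterns), size=n_samples)


v_samples = sample_pattern_distributions(5)
print(v_samples.sum(axis=1))
```

Each row is one candidate parameter vector of pattern probabilities for the Steiner Triples.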

Fig. 5 displays exemplary results obtained from the
sampling. Plots a, b, and c show the distributions
imposed on the STS (blue circles). This distribution
already determines the expectation value for the
link density of the network. E.g. suppose 60% of the Steiner
Triples adopt pattern A (which has two of the six
possible links being set) and 40% adopt A (five of the

FIG. 5. (Color online) Top: Distribution of triad configurations for the Steiner Triples (blue circles) and expected distribution
of triad configurations for ER graphs with the same link density (red squares). Bottom: Z-scores obtained from networks
sampled from the distributions above for systems of size N = 49 (blue circles) and N = 63 (violet squares), averaged over 15
sample networks.



FIG. 6. (Color online) Left: Pattern cross correlations in $2 \times 10^4$ randomly sampled distributions on Steiner Triple Systems.
Right: Correlations obtained from real data sets (Table II). The side length of the squares indicates the magnitude (0 to 1). Black
and red shading corresponds to positive and negative values, respectively. Shown are significant entries at a level of 5%.



six links being set) then the expected density will be
$p = (0.6 \cdot 2 + 0.4 \cdot 5)/6 \approx 53\%$. For comparison we also
plot the distribution one would expect on the STs for a
dyadic ER graph with the same link density as given by
Eq. (8) (red squares, dashed line in Fig. 5).
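
The density calculation in the example generalizes to any distribution over patterns. The following Python sketch reproduces the arithmetic; the function name `expected_link_density` is a hypothetical choice for this illustration.

```python
def expected_link_density(pattern_probs, links_per_pattern, max_links=6):
    """Expected fraction of the possible unidirectional links that are set."""
    if abs(sum(pattern_probs) - 1.0) > 1e-9:
        raise ValueError("pattern probabilities must sum to one")
    weighted = sum(q * m for q, m in zip(pattern_probs, links_per_pattern))
    return weighted / max_links


# 60% of the triples with two links, 40% with five links.
density = expected_link_density([0.6, 0.4], [2, 5])
print(f"{density:.3f}")  # 0.533
```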

Plots d, e, and f in Fig. 5 show the Z-score profiles
obtained from the input distributions above for networks of
size 49 (blue circles) and 63 (violet squares). Displayed 
are the mean values averaged over 15 samples for each 
distribution. For systems with no higher order structure, 
such as ER graphs, all Z-scores are expected to vanish. 
However, for the triadic random graph model, we ob- 
serve Z-scores with magnitudes larger than five, implying 
that certain motifs appear five standard deviations more 
frequently than expected for the randomized ensemble. 
Thus, triadic random graphs are capable of modelling 
structure of higher than dyadic order. It shall be em- 
phasized that this higher order structure does not stem 
from mesoscopic group structure; all Steiner Triples, and 
therefore all nodes, have the same parameters. In accor- 
dance with the literature [23] a larger system size results 
in a larger magnitude of the Z-scores. However, the shape 
of the Z-score profiles is size independent. 



B. Z-score correlations 

For the interpretation of triad significance profiles ob- 
served in real networks it is important to be aware of 
correlations between the Z-scores of pairs of triad pat- 
terns, which inherently already arise solely for statistical 
reasons. 

We performed extensive uniform sampling of the
16-dimensional simplex spanned by the parameter space of
the triadic random graph model (7). In fact, we
sampled more than $2 \times 10^4$ distinct distributions. For each
of the distributions, we generated five network instances
and we evaluated the average Z-score profiles. Using the
latter, we can evaluate cross correlations between pairs
of Z-scores over the input distributions sampled. For two
patterns, i and j, it is

$\mathrm{corr}(Z_i, Z_j) = \frac{\langle Z_i Z_j \rangle - \langle Z_i \rangle \langle Z_j \rangle}{\sqrt{\left( \langle Z_i^2 \rangle - \langle Z_i \rangle^2 \right) \left( \langle Z_j^2 \rangle - \langle Z_j \rangle^2 \right)}}$ (9)




The averages are taken over all sampled STS distribu- 
tions considered for the evaluation of the correlation ma- 
trix. The statistical significance of the correlation is 
tested by means of a t-test. 
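
A correlation matrix of this kind, together with a significance mask, can be sketched in Python as follows. The function name is hypothetical, and the test shown is an assumption: it uses the t statistic of the Pearson coefficient with the large-sample two-sided critical value of 1.96 for a 5% level, which may differ in detail from the test applied in this work.

```python
import numpy as np


def z_score_correlations(z, t_critical=1.96):
    """Pearson correlations between columns of z and a significance mask.

    z: array of shape (n_samples, n_patterns); one averaged Z-score
       profile per sampled input distribution.
    """
    z = np.asarray(z, dtype=float)
    n = z.shape[0]
    corr = np.corrcoef(z, rowvar=False)
    # t statistic of a correlation coefficient with n - 2 degrees of freedom.
    with np.errstate(divide="ignore", invalid="ignore"):
        t = corr * np.sqrt((n - 2) / (1.0 - corr ** 2))
    significant = np.abs(t) > t_critical
    np.fill_diagonal(significant, False)
    return corr, significant


# Hypothetical toy data: two anti-correlated columns and one unrelated column.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
toy = np.column_stack([x, -x + 0.1 * rng.normal(size=500),
                       rng.normal(size=500)])
corr, significant = z_score_correlations(toy)
print(np.round(corr, 2))
```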

Fig. 6 (a) shows the correlation matrix between pairs
of Z-scores when sampling randomly. Considered are
significant correlations at a level of 5%. The side lengths
of the squares are proportional to the magnitudes (zero to one)
of the correlation coefficients between the corresponding
subgraphs. Positive values are colored in black, negative ones in red.
One can clearly see that certain Z-scores are
strongly anti-correlated with each other while others are
positively correlated. To keep track of the impact of the
link density on potential correlations, the distributions
are grouped in bins of width 0.05. We evaluated separate
correlation matrices for each of the link-density ranges.
It turns out that correlations and anti-correlations occur
consistently between the same sets of triad patterns for
all link densities sampled.



In order to distinguish Z-scores which actually
describe characteristics of the networks from purely
statistical artifacts, we also investigated Z-score
correlations over various real-world networks. Fig. 6 (b) shows
the correlation matrix obtained from the 16 real-world
data sets shown in Table II. We observe that the most
pronounced correlations found in the ensemble of triadic
random graphs also appear in the real data sets. This
implies that the expressive power of single (anti-)motifs
might be limited and one should rather consider the
Z-score profile as a whole. Table III displays the ten strongest
cross correlations between pairs of triadic subgraph
patterns which were found in our random samples of the
triadic random graph ensemble together with the
correlation coefficients found in the real data for the respective
pairs of triad patterns. Apparently, nine of the top ten
(anti-)correlations of the statistical data are also found
in the real systems. However, not all entries of the
correlation matrix obtained from the triadic random graphs are
reflected in Fig. 6 (b): e.g. patterns A and A are
anti-correlated in the random ensemble, while being strongly
positively correlated in the real data. This gives rise to
the conjecture that this correlation captures valuable
information about the systems' structure. In contrast, e.g.
the correlation between patterns A and ^ seems to stem
from statistical roots.



Correlations in the appearance of subgraph motifs have been
investigated before by Ginoza et al. [52]. Yet, their work focuses on
correlations within the randomization process of single networks. They
consider motifs in two particular networks, namely the transcriptional
regulatory networks of E. coli and S. cerevisiae. One of their key
results is that the abundances of patterns A, A, and ^ are strongly
mutually correlated, while being anti-correlated with pattern A. Our
approach, however, considers correlations which appear over multiple
network instances and is therefore complementary to the one in [52].
Again, Fig. 6(a) displays our observed correlations between subgraph
patterns which occur solely for statistical reasons. In accordance
with Ginoza et al. we find strong correlations between patterns A and
A, as well as strong anti-correlation of them with A. However, the
former are hardly correlated with pattern A (in fact, the correlation
coefficient is even slightly negative). Although in most real networks
there is doubtlessly a strong mutual (anti-)correlation in the
abundance of subgraphs A, A, A, and A, our results indicate that these
correlations do not necessarily arise for statistical reasons.



Data set                  |      1 |      2 |      3 |      4 |      5 |      6 |      7 |      8 |      9 |     10 |     11 |     12 |     13
--------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+-------
C. Elegans                |  -16.5 |  -6.29 | -24.23 | -11.99 |  12.43 |  24.48 | -27.02 |  -16.3 |  -5.02 |   2.59 |  27.29 |  13.15 |   9.64
Political blogs           | -76.09 | -51.28 | -49.36 | -58.19 |  55.26 |  40.28 | -54.07 | -31.17 |  -2.32 |   2.97 |  47.19 |  27.05 |  24.82
E. coli (v. 1.1)          | -12.23 | -12.23 |        | -12.23 |  12.23 |        |        |        |        |        |        |        |
English book              |  26.09 |  13.58 |  14.35 |   22.8 | -22.52 |    -10 |  24.67 |  13.51 |  -1.39 |  -6.59 | -21.84 | -13.58 |  -5.53
French book               |  31.51 |  26.31 |   13.4 |  31.52 |  -29.1 | -10.16 |  16.17 |  12.03 |  -11.5 | -12.34 | -15.07 | -12.33 |  -4.72
Japanese book             |  15.01 |  12.05 |  13.43 |  14.97 | -14.39 |  -7.94 |  12.13 |   9.27 |  -4.76 |  -9.92 |   -7.4 |   -8.3 |  -3.07
Spanish book              |  26.58 |   27.5 |  13.57 |  23.77 |  -22.3 |  -4.16 |  29.35 |  12.39 | -13.22 | -19.82 | -25.22 | -10.99 |  -7.57
leader2Inter              |  -2.25 |   -1.2 |  -2.58 |  -1.22 |   0.81 |   1.33 |  -3.24 |   -4.5 |   0.38 |   1.15 |   2.31 |    1.8 |   3.53
prisonInter               |  -6.06 |  -3.71 | -10.14 |  -9.06 |   4.31 |   7.84 |  -8.26 | -13.83 |    0.4 |   1.99 |   5.42 |   7.49 |  11.93
Electr. circ. (s208) [45] |   1.63 |  -9.57 |        |   1.63 |  -1.63 |        |        |        |  11.01 |        |        |        |
Electr. circ. (s420) [45] |   1.61 | -17.21 |        |   1.61 |  -1.61 |        |        |        |  20.74 |        |        |        |
S. Cerevisiae             | -13.73 | -13.52 |  -0.96 | -13.66 |   13.6 |  -0.35 |  -5.91 |        |  -0.17 |    9.9 |   3.94 |        |

TABLE II. Z-scores observed in real-world data sets. Columns are
numbered by position; each corresponds to one triad pattern.



Rank | patterns | random samples | real data
-----+----------+----------------+----------
   1 | A, A     |      -0.780487 | -0.527285
   2 | A, A     |      -0.74238  | -0.977566
   3 | A, A     |      -0.732722 | -0.998263
   4 | A, A     |      -0.730034 | -0.989075
   5 | A, A     |      -0.661864 | -0.985936
   6 | A, A     |      -0.661757 | -0.996671
   7 | A, A     |       0.578477 |  0.990511
   8 | A, A     |      -0.562423 |  0.958279
   9 | A, A     |      -0.49549  | -0.703186
  10 | A, A     |      -0.488278 | -0.835307

TABLE III. Top 10 (anti-)correlations between subgraph patterns found
in the synthetic random samples, as well as the corresponding
correlations observed in real-world data sets.



C. Degree distributions 

An important characteristic of complex networks is their degree
distribution. In dyadic ER graphs, the node degrees are expected to be
Poisson distributed:

$$ P(k_i = k) = e^{-\langle k \rangle}\, \frac{\langle k \rangle^{k}}{k!} \qquad (10) $$

This holds for both in- and out-degrees.

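To derive the expected in-degree distribution for triadic random
graphs, consider an arbitrary node $i$. It is part of $(N-1)/2$
Steiner Triples. Let $s_i$ be a random variable indicating the number
of $i$'s Steiner Triples in which a single edge is directed towards
it. Further, let $d_i$ be the random variable indicating the number of
its Steiner Triples with two links directed towards it. From the
probabilities in Eq. (7) we can directly infer the probabilities for a
single ST to contribute to $s_i$ and $d_i$, respectively:

$$
\begin{aligned}
p(s_i) &= \frac{1}{3} \sum_{t \in \mathcal{S}_1} p(t) + \frac{2}{3} \sum_{t \in \mathcal{S}_2} p(t) + \sum_{t \in \mathcal{S}_3} p(t) \\
p(d_i) &= \frac{1}{3} \sum_{t \in \mathcal{D}_1} p(t) + \frac{2}{3} \sum_{t \in \mathcal{D}_2} p(t) + \sum_{t \in \mathcal{D}_3} p(t)
\end{aligned}
\qquad (11)
$$

Here $\mathcal{S}_m$ ($\mathcal{D}_m$) denotes the set of triad
patterns in which exactly $m$ of the three nodes receive one (two)
edges from within the triple, so that the prefactor $m/3$ is the
probability that node $i$ occupies such a position. The sets
$\mathcal{S}_1$, $\mathcal{S}_2$, $\mathcal{S}_3$ contain 4, 6, and 2
patterns, and $\mathcal{D}_1$, $\mathcal{D}_2$, $\mathcal{D}_3$
contain 6, 2, and 1 patterns, respectively.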
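The position weights 1/3, 2/3, and 1 in Eq. (11) can be checked by brute-force enumeration. The sketch below is illustrative only: it works with the 64 labelled edge configurations of a single triple rather than with the triad patterns of the model, and it assumes that a node occupies each of the three positions with equal probability. The names `in_degree_weights` and `p_s_and_p_d` are hypothetical.

```python
from fractions import Fraction
from itertools import product

NODES = (0, 1, 2)
# The six possible directed edges (source, target) inside one triple.
EDGES = [(a, b) for a in NODES for b in NODES if a != b]


def in_degree_weights(edge_set):
    """Fraction of the three node positions that receive exactly one
    and exactly two edges in a given directed configuration."""
    indeg = [sum(1 for (_, target) in edge_set if target == n) for n in NODES]
    return Fraction(indeg.count(1), 3), Fraction(indeg.count(2), 3)


def p_s_and_p_d(config_prob):
    """config_prob maps a frozenset of edges to its probability.
    Returns (p(s_i), p(d_i)) for a node placed uniformly at random."""
    p_s = sum(p * in_degree_weights(c)[0] for c, p in config_prob.items())
    p_d = sum(p * in_degree_weights(c)[1] for c, p in config_prob.items())
    return p_s, p_d


# Example input: all 64 labelled configurations equally likely.
ALL_CONFIGS = [frozenset(e for e, on in zip(EDGES, bits) if on)
               for bits in product((0, 1), repeat=6)]
UNIFORM = {c: Fraction(1, 64) for c in ALL_CONFIGS}
```

For a single directed edge `in_degree_weights` returns the weight 1/3 for the single-edge count, and for a directed cycle it returns 1, matching the first and last groups of terms in the expression for $p(s_i)$.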



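Since the model parameters are the same for all nodes, the expectation
values for $s_i$ and $d_i$ will also be the same for all $i$:

$$
\langle s \rangle = \langle s_i \rangle = \frac{N-1}{2}\, p(s_i), \qquad
\langle d \rangle = \langle d_i \rangle = \frac{N-1}{2}\, p(d_i). \qquad (12)
$$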



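Each of the $(N-1)/2$ Steiner Triples of node $i$ has either no, one,
or two edges directed towards it. Therefore, the joint probability
distribution of $s_i$ and $d_i$ is given by the multinomial:

$$
\begin{aligned}
P(s_i = n_s, d_i = n_d)
&= \binom{\frac{N-1}{2}}{n_s,\, n_d}\, \bigl[1 - p(s_i) - p(d_i)\bigr]^{\frac{N-1}{2} - n_s - n_d}\, p(s_i)^{n_s}\, p(d_i)^{n_d} \\
&= \frac{\left(\frac{N-1}{2}\right)!}{\left(\frac{N-1}{2} - n_s - n_d\right)!} \left(\frac{2}{N-1}\right)^{n_s + n_d} \frac{\langle s \rangle^{n_s} \langle d \rangle^{n_d}}{n_s!\, n_d!} \left[1 - \frac{2\left(\langle s \rangle + \langle d \rangle\right)}{N-1}\right]^{\frac{N-1}{2} - n_s - n_d}
\end{aligned}
\qquad (13)
$$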



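For the second equality we used Eq. (12). For large sparse systems
with $\langle s \rangle, \langle d \rangle \ll N$ we find

$$
\begin{aligned}
\lim_{N \to \infty} P(s_i = n_s, d_i = n_d)
&= \lim_{N \to \infty} \frac{\left(\frac{N-1}{2}\right)!}{\left(\frac{N-1}{2} - n_s - n_d\right)!} \left(\frac{2}{N-1}\right)^{n_s + n_d} \left[1 - \frac{\langle s \rangle + \langle d \rangle}{\frac{N-1}{2}}\right]^{\frac{N-1}{2} - n_s - n_d} \frac{\langle s \rangle^{n_s}}{n_s!} \frac{\langle d \rangle^{n_d}}{n_d!} \\
&= \lim_{N \to \infty} \frac{\frac{N-1}{2} \left(\frac{N-1}{2} - 1\right) \cdots \left(\frac{N-1}{2} - n_s - n_d + 1\right)}{\left(\frac{N-1}{2}\right)^{n_s + n_d}} \left[1 - \frac{\langle s \rangle + \langle d \rangle}{\frac{N-1}{2}}\right]^{\frac{N-1}{2} - n_s - n_d} \frac{\langle s \rangle^{n_s}}{n_s!} \frac{\langle d \rangle^{n_d}}{n_d!} \\
&= \frac{\langle s \rangle^{n_s}}{n_s!}\, \frac{\langle d \rangle^{n_d}}{n_d!}\, e^{-\langle s \rangle - \langle d \rangle}
\end{aligned}
\qquad (14)
$$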
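The convergence of the multinomial of Eq. (13) to the product of two Poisson distributions in Eq. (14) can be checked numerically. The following is a small sketch under the assumption that $N$ is odd, so that $(N-1)/2$ is an integer; the function names are chosen for this example only.

```python
import math


def multinomial_joint(n_s, n_d, N, mean_s, mean_d):
    """Exact joint probability P(s_i = n_s, d_i = n_d) of Eq. (13)."""
    m = (N - 1) // 2                     # Steiner Triples per node
    p_s, p_d = mean_s / m, mean_d / m    # Eq. (12) solved for p(s_i), p(d_i)
    coeff = math.comb(m, n_s) * math.comb(m - n_s, n_d)
    return coeff * p_s ** n_s * p_d ** n_d * (1 - p_s - p_d) ** (m - n_s - n_d)


def poisson_product(n_s, n_d, mean_s, mean_d):
    """Large-N limit of Eq. (14): product of two Poisson distributions."""
    return (math.exp(-mean_s - mean_d)
            * mean_s ** n_s / math.factorial(n_s)
            * mean_d ** n_d / math.factorial(n_d))
```

Evaluating both functions at fixed $\langle s \rangle$ and $\langle d \rangle$ for increasing $N$ shows the exact value approaching the Poisson product, while for small systems the two differ noticeably.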



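The in-degree of node $i$ is

$$ k_i^{\mathrm{in}} = s_i + 2 d_i. \qquad (15) $$

The probability distribution for node $i$ to have in-degree $K$ is
thus

$$
\begin{aligned}
P(k_i^{\mathrm{in}} = K)
&= \sum_{n_s = 0}^{\frac{N-1}{2}} \sum_{n_d = 0}^{\frac{N-1}{2}} e^{-\langle s \rangle - \langle d \rangle}\, \frac{\langle s \rangle^{n_s}}{n_s!}\, \frac{\langle d \rangle^{n_d}}{n_d!}\, \delta_{K,\, n_s + 2 n_d} \\
&= e^{-\langle s \rangle - \langle d \rangle} \sum_{n_d = 0}^{\lfloor K/2 \rfloor} \frac{\langle s \rangle^{K - 2 n_d}\, \langle d \rangle^{n_d}}{(K - 2 n_d)!\; n_d!}
\end{aligned}
\qquad (16)
$$

where $\delta$ is the Kronecker delta ($\delta_{ij} = 1$ if $i = j$,
and $0$ otherwise). In the limit $\langle d \rangle \to 0$ the
distribution is Poissonian. With $\langle d \rangle$ approaching
$\frac{1}{2} \langle k^{\mathrm{in}} \rangle$, the distribution
becomes broader, implying larger deviations from $\langle k \rangle$.
Fig. 7 shows distributions of Eq. (16) with fixed
$\langle k \rangle = \langle s \rangle + 2 \langle d \rangle = 100$
for various ratios $r = \langle s \rangle / \langle d \rangle$
together with the corresponding Poissonian.

The out-degree distribution can be derived analogously. In this case,
only the probabilities for the triads with a single out-going edge,
$p(s_i^{\mathrm{out}})$, and for two out-going edges,
$p(d_i^{\mathrm{out}})$, need to be adjusted accordingly:

$$
\begin{aligned}
p(s_i^{\mathrm{out}}) &= \frac{1}{3} \sum_{t \in \mathcal{S}^{\mathrm{out}}_1} p(t) + \frac{2}{3} \sum_{t \in \mathcal{S}^{\mathrm{out}}_2} p(t) + \sum_{t \in \mathcal{S}^{\mathrm{out}}_3} p(t) \\
p(d_i^{\mathrm{out}}) &= \frac{1}{3} \sum_{t \in \mathcal{D}^{\mathrm{out}}_1} p(t) + \frac{2}{3} \sum_{t \in \mathcal{D}^{\mathrm{out}}_2} p(t) + \sum_{t \in \mathcal{D}^{\mathrm{out}}_3} p(t)
\end{aligned}
\qquad (17)
$$

Here $\mathcal{S}^{\mathrm{out}}_m$ ($\mathcal{D}^{\mathrm{out}}_m$)
denotes the set of triad patterns in which exactly $m$ of the three
nodes have one (two) out-going edges within the triple; these sets
contain 4, 6, and 2 (6, 2, and 1) patterns for $m = 1, 2, 3$.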

D. Design of significance profiles 



To achieve the goal of designing networks with certain triad
significance profiles, it is important to understand the relationship
between the distribution of triad configurations on Steiner Triples
and the Z-scores obtained from their ensembles. Therefore, we also
investigated the cross correlations between the probabilities $v_i$ of
the ST configurations and the corresponding Z-scores $z_j$:

$$
C_{v_i, z_j} = \frac{\langle v_i z_j \rangle - \langle v_i \rangle \langle z_j \rangle}{\sigma_{v_i}\, \sigma_{z_j}} \qquad (18)
$$

FIG. 7. (Color online) Degree distributions for mean degree
$\langle k \rangle = 100$ and various ratios
$r = \langle s \rangle / \langle d \rangle$.

FIG. 8. (Color online) Correlation matrix between triad configurations
on ST and the resulting Z-score profiles obtained from 5000
configurations. The length of the squares indicates the magnitude (0
to 1). Black and red shading corresponds to positive and negative
values, respectively.



Results are presented in Fig. 8. Of course, there is a strong
correlation between the imposed triad patterns on the Steiner Triples
and the Z-scores of these patterns. However, as for the Z-score to
Z-score cross correlations, we again observe strong anti-correlations
between certain patterns. As before, the observations are valid for
all examined link densities. Correlations between the input
distributions on the STS and the obtained overall Z-score profiles can
be helpful in designing systems with pre-defined significance
profiles.

For a simplistic approach, we assume a linear relation between the
input distribution V and the significance profile, conveyed by the
correlation matrix C (Fig. 8),

$$ SP \propto C\, V. \qquad (19) $$

In order to design systems with pre-defined significance profiles, it
is necessary to map the latter to a corresponding input distribution,
which can be realized by means of the pseudo-inverse matrix $C^{-1}$:

$$ V \propto C^{-1}\, SP. \qquad (20) $$

FIG. 9. (Color online) Significance profiles corresponding to the
Z-scores in Fig. 5 for systems of size 49 (blue circles). The violet
squares indicate the prediction obtained from the input distribution V
by assuming $SP \propto C\, V$.

FIG. 10. (Color online) Attempt to model the significance profile
indicated by the violet squares. However, networks with their
parametrization obtained from Eq. (20) yield very different
significance profiles.



Fig. 9 shows the significance profiles corresponding to the Z-scores
in Fig. 5 together with the prediction obtained from Eq. (19). The
predictions agree very well with the actually observed profiles.
However, attempts to model arbitrary significance profiles with the
linear relation Eq. (20) will not succeed in all cases, as shown in
Fig. 10.
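The linear mapping of Eqs. (18) to (20) can be sketched with NumPy as follows. Several choices here are assumptions made for illustration rather than details taken from the text: the matrix is stored with one row per Z-score and one column per input probability so that the product in Eq. (19) is well defined, the predicted profile is normalized to unit length, and the pseudo-inverse is computed with `numpy.linalg.pinv`. All function names are hypothetical.

```python
import numpy as np


def cross_correlation(V, Z):
    """Pearson cross-correlation matrix in the sense of Eq. (18).
    V has shape (samples, inputs), Z has shape (samples, patterns);
    entry [j, i] correlates Z-score z_j with input probability v_i."""
    Vc = V - V.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    cov = Zc.T @ Vc / V.shape[0]
    return cov / np.outer(Z.std(axis=0), V.std(axis=0))


def predict_profile(C, v):
    """Eq. (19): SP proportional to C V, normalized to unit length."""
    sp = C @ v
    return sp / np.linalg.norm(sp)


def design_input(C, sp):
    """Eq. (20): V proportional to the pseudo-inverse of C applied to SP."""
    return np.linalg.pinv(C) @ sp
```

Note that the vector returned by `design_input` is not guaranteed to have non-negative entries or to be normalizable to a probability distribution, which is one elementary way in which a requested profile can fail to be realizable under the linear assumption.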

This may be for various reasons. First, the relationship between V and
the significance profile is certainly not entirely linear. Second, not
all significance profiles are necessarily realizable; consider, for
example, an SP with all patterns being overrepresented. Furthermore,
the triadic random graph model is the most simplistic model based on
STS; it does not, for instance, account for individual node
properties. This is also reflected in the fact that the degree
distributions of triadic random graphs are close to a Poissonian. A
formulation of more specific models based on STS may overcome these
shortcomings. Still, these first steps open the way to efficiently
generate networks in which certain motifs are over- or
underrepresented and thus enable systematic investigations of the
functional significance of these motifs.



V. CONCLUSIONS 

Over the last decade the over- and underrepresentation of particular
sub-network patterns has attracted considerable attention. This led to
the hypothesis that these patterns, instead of links, serve as the
building blocks of network structures [23]. The fact that only a small
portion of triad configurations can actually be specified
independently poses a challenge to the formulation of generative
models which account for higher-order substructures. Based on sets of
pair-disjoint triads, so-called Steiner Triple Systems, we have
introduced a novel class of generative models. The simplest
realization of such models assumes the same probability distribution
over the possible triad patterns for all Steiner Triples in the
system. We referred to them as triadic random graph models. By
extensive sampling we showed that, in contrast to ER graphs, even this
most simplistic model is capable of inducing non-vanishing Z-scores.
Furthermore, we discovered inevitable correlations between triad
patterns with respect to their abundance. These occur solely for
statistical reasons. This dependence in the appearance of subgraph
patterns should be taken into account when attributing functional
relevance to network motifs. Moreover, we unveiled correlations
between the probability distributions on the Steiner Triples and the
observed Z-score profiles over the whole network. These are helpful
for designing ensembles of networks with pre-defined significance
profiles, which can facilitate a systematic study of the effect of
motif distributions on network dynamics. Finally, we calculated the
degree distributions of triadic random graphs analytically. We found
them to be similar, yet not identical, to a Poissonian. Depending on
the input distribution V, the degree distribution is broader than a
Poissonian.

The triadic random graph model assumes all nodes to be equal and thus
can be considered the triadic analogue of ER graphs. However, in many
real-world systems, individual node properties like the popularity or
activity of vertices play a crucial role. Future models based on
Steiner Triple Systems which, e.g., aim to predict hitherto
undiscovered links may include those parameters in Eq. (Ig]). In
addition, the introduced framework also allows for the definition of
models for signed networks, i.e. graphs with positive or negative
edges, which play an important role in the social sciences in the
context of structural balance theory [S3J |21j as well as in the
biosciences, where they are used to model excitatory and inhibitory
links in neural or gene-regulation networks [55].